perm filename VIS[0,BGB]3 blob
sn#069838 filedate 1973-11-05 generic text, type C, neo UTF8
2.0 Computer Vision Theory.
2.1 Introduction to Computer Vision Theory.
In this chapter, two theories are interleaved. There is a
grand theory, which is my interpretation of the overall state of
computer vision; and there is a petit theory, which has inspired
this work. The word "theory", as used here, means simply a set of
statements presenting a systematic view of a subject. I wish to
exclude the connotations that the theory is a mathematical theory or
a natural theory. Perhaps there can be such a thing as an
"artificial theory" that lies between the philosophy and the design
of computer vision. The rest of this introduction is a synopsis of
the two theories, which consist of six parts in three pairings: task
& system; bottom & top; recognition & description.
(Vision task). For me, the overall computer vision problem
is to write a general purpose program that can see and act with
respect to the real physical world. The interest of other
researchers in modeling human perception, in participating in
traditional philosophical arguments, in solving puzzle problems or
in developing advanced automation techniques must constantly be
taken into account when discussing computer vision.
(cart task). Given a computer controlled cart,
explore and map the world.
(Cart Hardware Description). The cart at the Stanford
Artificial Intelligence Laboratory is intended for outdoors use and
consists of four bicycle wheels, a piece of plywood, two car
batteries, a television camera, a television transmitter, and a
toy airplane radio receiver. (The vehicle being discussed is not
"Shakey", which belongs to the Stanford Research Institute's
Artificial Intelligence Group. There are two "Stanford-ish" A.I.
Labs and each has a computer controlled vehicle.) Logically the cart
has three motors which can be commanded to run in one or the other
direction under computer control. The six possible cart action
commands are: run forwards, run backwards, steer to the left, steer
to the right, pan camera to the left, pan camera to the right.
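As an illustration only, the three motors and six commands can be tabulated as below. The names `Motor`, `Direction` and `CART_COMMANDS` are hypothetical, and the choice of which direction counts as positive is an assumption; the text gives no sign convention and no actual control interface.

```python
from enum import Enum


class Motor(Enum):
    """The three logical motors of the cart."""
    DRIVE = "drive"
    STEER = "steer"
    PAN = "pan"


class Direction(Enum):
    """Each motor runs in one or the other direction (sign is an assumption)."""
    POSITIVE = +1
    NEGATIVE = -1


# The six action commands named in the text, each as (motor, direction).
CART_COMMANDS = {
    "run forwards": (Motor.DRIVE, Direction.POSITIVE),
    "run backwards": (Motor.DRIVE, Direction.NEGATIVE),
    "steer to the left": (Motor.STEER, Direction.POSITIVE),
    "steer to the right": (Motor.STEER, Direction.NEGATIVE),
    "pan camera to the left": (Motor.PAN, Direction.POSITIVE),
    "pan camera to the right": (Motor.PAN, Direction.NEGATIVE),
}


def all_motor_actions():
    """Every (motor, direction) pair: three motors times two directions."""
    return [(motor, direction) for motor in Motor for direction in Direction]
```

The six commands are exactly the three motors taken in each of two directions, which is what `all_motor_actions()` enumerates.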
(turn table task). The turn table task is to construct
a 3-D model from a sequence of 2-D television images taken
of an object rotated on a turn table.
(block tasks). The classic block vision task, dating from
Roberts, consists of two parts: first convert a video image
into a line drawing; second, find a selection of
prototype blocks that account for the line drawing.
[single image vs. multiple images].
[perfect line drawing puzzles: Guzman & Waltz].
[imperfect line drawing analysis]
(Recognition tasks).
(Vision systems). The structure of any computer vision
system can be expressed as a transducer between perceived images and
a world model. The two poles of the vision transducer are called
"bottom" for images and "top" for models. Although I do not like the
vision/language analogy, I wish to adopt the top and bottom jargon
as formal vision terminology, because it is concise and widely used.
The vision transducer may be bidirectional
and visual transduction is a continuous rather than a discrete process.
1. bidirectional rather than one way.
2. continuous rather than discrete.
3. exact rather than fuzzy.
4. numerical rather than linguistic.
Computer vision is the inverse of
computer graphics. The problem of computer graphics is to synthesize
images from three dimensional models; the problem of computer vision
is to analyze images into three dimensional models.
The vision transducer has three possible modes:
verification, revelation and recognition.
Depending on circumstances, the vision transducer should be able to
run almost entirely top-down (verification vision) or bottom-up
(revelation vision). Verification vision is all that is required in
a well known and consequently predictable environment; whereas
revelation vision is required in a brand new or rapidly changing
environment.
(Bottom: the nature of images). There are three basic kinds
of information in a 2-D visual image: photometric, geometric, and
topological; also there are four kinds of 2-D images: raster,
contour, mosaic, and feature. The traditional subject of image
processing involves the study and development of programs that
enhance, transform and compare 2D images. Nearly all such image
processing work can be subsumed into computer vision.
(Top: the nature of worlds). The rules about the world that
can be assumed a priori by a programmer are the laws of physics;
programming a simulation of the mundane physical world to a given
approximation is difficult.
(recognition). Recognition involves comparing
perceived data with predicted data; such recognition comparing can
be done on any of the four types of 2-D images or the 3-D models.
Arcane recognition techniques can be avoided by improving the
prediction and the analysis so that matches are nearly obvious.
(locus solving). The crux of computer vision
is to deduce information about the world being viewed from
images of that world. I believe that the world information most
directly relevant is the physical location, extent and light
scattering properties of solid opaque objects; the location,
orientation and scales of the cameras that take the pictures; and
the location and nature of the lights that illuminate the world.
Accordingly, three central themes of my theory are body locus
solving, camera solving, and sun solving. The macroscopic world
doesn't change very rapidly; between any two world states there is
an intermediate world state. Parallax is the principal means of
depth perception. Parallax is the alchemist that converts 2-D
images into 3-D models. Revelation vision is a process of comparing
perceived images taken in sequence and constructing a 3-D model of
the unanticipated objects.
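How parallax yields depth can be shown with a small worked case. The sketch below assumes an ideal pinhole camera that moves sideways by a known baseline b, parallel to the image plane and without rotating; none of these conditions is stated in the text. Under them, a point at depth Z shifts in the image by x1 - x2 = f b / Z, so Z = f b / (x1 - x2). The function and parameter names are hypothetical.

```python
def project_x(point_x, depth, camera_x, focal_length):
    """Image x coordinate of a point seen by a pinhole camera at camera_x."""
    return focal_length * (point_x - camera_x) / depth


def depth_from_parallax(x_first, x_second, baseline, focal_length):
    """Depth of a point from its image shift between two camera positions.

    Assumes the camera moved sideways by `baseline` (parallel to the image
    plane) without rotating; these conditions are assumptions of this sketch.
    """
    parallax = x_first - x_second
    if parallax == 0:
        raise ValueError("no parallax: point at infinity or camera did not move")
    return focal_length * baseline / parallax


# A point 4 units away, viewed before and after a 0.5 unit sideways move.
x1 = project_x(1.0, 4.0, camera_x=0.0, focal_length=500.0)
x2 = project_x(1.0, 4.0, camera_x=0.5, focal_length=500.0)
recovered_depth = depth_from_parallax(x1, x2, baseline=0.5, focal_length=500.0)
```

Distant points produce small parallax, so any error in measuring the image shift is magnified in the recovered depth.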
(Computer Vision and Artificial Intelligence).
At one extreme, computer vision may be described as merely
the problem of getting visual input hardware properly connected to a
computer; once the computer can "see" a raster of intensities in its
memory, the rest of the problem is artificial intelligence. The
other extreme is harder to depict because it requires figuring where
to draw the line between vision software and intelligence software.
Normal vision should not be an Artificial Intelligence
problem in the sense that it will not involve searching a large
space of possibilities or solving abstract problems.
"The history of progress in the development of systems for automatic
symbolic integration poses an interesting question about the
definition of artificial intelligence. Few would argue that Slagle's
SAINT program was a product of artificial intelligence research.
Moses' SIN program for symbolic integration seldom needed to resort
to search, and for this reason some people consider it much more
powerful (intelligent ?) than SAINT. Now, Risch (1969) has developed
an algorithm for integrating many types of expressions. Risch
considers himself a mathematician, not an artificial intelligence
researcher. In your opinion should Risch's algorithm be considered
part of the subject matter of artificial intelligence ? If you would
exclude Risch from artificial intelligence, how would you respond to
the statement that every artificial intelligence program might
eventually be dominated by a (more intelligent?) non artificial
intelligence algorithm? If you would include Risch, would you also
include the long-division algorithm?"
- Nils J. Nilsson, problem 4-5;
Problem-Solving Methods in Artificial Intelligence.
(Intellectual Entities). The larger context of a vision
theory depends on one's opinion about the nature of conscious
intelligent animals, men and robots. It is my opinion that mind is
to matter, as computer software is to computer hardware. That is,
mind is a program that is running in the brain. Well now, what
software can account for consciousness, the inner private life of
the self that burns in our heads? The so-called stream of
consciousness consists of little voice(s) talking, fragments of
music playing, and most important there is the flow of the here and
now. The "here-and-now" is the totality of the particular sights,
sounds, smells, and so on that are being played in your head in
sync with the respective sensory stimuli. So I believe that
the major computation being performed by an intellectual entity in
order to stay conscious of its external world is a reality
simulation.
(Feigenbaum Quote).
[the relation between Artificial Intelligence, experiment,
environmental simulation].
"The design, implementation, and use of the robot hardware
presents some difficult, and often expensive, engineering and
maintenance problems. If one is to work in this area solving such
problems is a necessary prelude but, more often than not,
unrewarding because the activity does not address the questions of
A.I. research that motivate the project. Why, then, build devices?
Why not simulate them and their environment? In fact, the SRI group
has done good work in simulating a version of their robot in a
simplified environment. The answer given is as follows. It is felt
by the SRI group that the most unsatisfactory part of their
simulation effort was the simulation of the environment. Yet, they
say that 90% of the effort of the simulation team went into this
part of the simulation. It turned out to be very difficult to
reproduce in an internal representation for a computer the necessary
richness of environment that would give rise to interesting behavior
by the highly adaptive robot. It is easier and cheaper to build a
hardware robot to extract what information it needs from the real
world than to organize and store a useful model. Crudely put, the
SRI group's argument is that the most economic and efficient store
of information about the real world is the real world itself."
- E. A. Feigenbaum [ref. X].
The Vision Transducer.
Grand Theory: The structure of any computer vision system
can be expressed as a transducer between a bottom of perceived
images and a top, world model.
Petit Theory:
Computer vision is the inverse of computer graphics. The problem of
computer graphics is to synthesize images from three dimensional
models; the problem of computer vision is to analyze images into
three dimensional models.
(Vision loop terminology)
1. PREDICT 2D ← 3D synthesis Verification
2. PERCEIVE 3D ← 2D analysis Revelation
3. COMPARE recognition
(Description of nearly pure top down vision)
(Description of nearly pure bottom up vision)
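The PREDICT and COMPARE steps of the vision loop listed above can be sketched for a toy world of named 3-D points. This is a minimal illustration under strong assumptions that are not in the text: a pinhole camera at the origin looking along z, image features that already carry the names of the model points they correspond to, and a fixed distance tolerance. The PERCEIVE step (2-D to 3-D analysis) is not implemented here; features returned as unanticipated are the ones that would be handed to it. All names are hypothetical.

```python
import math


def predict(model, focal_length=1.0):
    """PREDICT: synthesize 2-D image points from a 3-D model (pinhole)."""
    return {name: (focal_length * x / z, focal_length * y / z)
            for name, (x, y, z) in model.items()}


def compare(predicted, perceived, tolerance=0.05):
    """COMPARE: sort features into verified, contradicted, unanticipated.

    verified      - predicted and seen within `tolerance` of the prediction
    contradicted  - predicted but missing or seen somewhere else
    unanticipated - seen but absent from the model
    """
    verified, contradicted = [], []
    for name, (u, v) in sorted(predicted.items()):
        if name in perceived:
            seen_u, seen_v = perceived[name]
            if math.hypot(u - seen_u, v - seen_v) <= tolerance:
                verified.append(name)
                continue
        contradicted.append(name)
    unanticipated = sorted(set(perceived) - set(predicted))
    return verified, contradicted, unanticipated


# Toy world model and a perceived image with one moved and one new feature.
world_model = {"a": (0.0, 0.0, 2.0), "b": (1.0, 0.0, 2.0)}
perceived_image = {"a": (0.0, 0.01), "b": (0.9, 0.0), "c": (0.3, 0.3)}
result = compare(predict(world_model), perceived_image)
```

When every feature comes back verified, the loop is running nearly pure top down; when most features come back unanticipated, the work shifts to bottom up analysis.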
Bottom: The Nature of Images.
Assumption: Computer vision based on digitized television images.
Alternatives: 1. Active 3-D imaging device.
2. Non-light devices: sound, radar, neutrinos, etc.
Although a super intellectual entity would have eyes that
could see the whole electromagnetic spectrum from gamma radiation to
direct current, as well as "voices" that could broadcast on any and
all frequencies, the video restriction
An image contains three basic kinds of data:
topological data, geometric data, and photometric data.
The quality of the particular computer vision system
that one is condemned to use is very likely to influence one's
theories.
Visual Organ
size of image
photometric accuracy, bits per pixel
resolution
speed of image taking
Computing Organ
central processor
primary memory
secondary memory
Locus Solving.
1. Camera Locus Solving.
2. Body Locus Solving.
Silhouette Cone Intersection.
Envelope bodies.
3. Sun Locus Solving.
(compute it, look at it, shine and shadows).
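Silhouette cone intersection, listed under body locus solving above, can be illustrated by carving a grid of sample points. The sketch below is not the method of this work: it assumes orthographic projection (so each silhouette "cone" is really a cylinder), a turn table rotating the object about the vertical z axis, a fixed camera looking along y, and silhouettes supplied as functions that test whether an image point is inside. A sample point survives only if it projects inside the silhouette in every view. All names are hypothetical.

```python
import math


def carve(points, views):
    """Keep the 3-D points whose projection lies inside every silhouette.

    views: list of (turn_table_angle_in_radians, inside) pairs, where
    inside(u, v) returns True if image point (u, v) is in the silhouette.
    """
    kept = []
    for (x, y, z) in points:
        for angle, inside in views:
            # Rotate the point with the turn table, then drop the depth (y)
            # coordinate: orthographic projection onto the image plane.
            u = x * math.cos(angle) - y * math.sin(angle)
            if not inside(u, z):
                break
        else:
            kept.append((x, y, z))
    return kept


def grid(samples_per_axis, half_width):
    """Regularly spaced sample points filling a cube centered at the origin."""
    step = 2.0 * half_width / (samples_per_axis - 1)
    coords = [-half_width + i * step for i in range(samples_per_axis)]
    return [(x, y, z) for x in coords for y in coords for z in coords]


def unit_disc(u, v):
    """Silhouette of a unit sphere, which looks the same from every angle."""
    return u * u + v * v <= 1.0


two_views = [(0.0, unit_disc), (math.pi / 2, unit_disc)]
four_views = two_views + [(math.pi / 4, unit_disc), (3 * math.pi / 4, unit_disc)]
```

With only two views the point (0.9, 0.9, 0.0) survives although it lies outside the unit sphere; adding the two diagonal views carves it away. The intersection always encloses the true body and tightens as views are added.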
Recognition.
Top: The Nature of Worlds.
Assumption: The world model should be a 3-D geometric model.
Alternatives: 1. Image memory and 2-D models.
2. Procedural Knowledge.
3. Semantic knowledge.
4. Formal Logic models.
5. Statistical world model.
(On Partial Knowledge).
Assumption: Partial knowledge should be represented by approximation.
Alternatives: 1. Tree of possibilities.
2. Multi valued logic.
3. Probabilities.
(Alternate world models).
(Reality Simulation).
"For the purpose of presenting my argument I must first explain the
basic premise of sorcery as don Juan presented it to me. He said
that for a sorcerer, the world of everyday life is not real, or out
there, as we believe it is. For a sorcerer, reality or the world we
all know, is only a description. For the sake of validating this
premise don Juan concentrated the best of his efforts into leading
me to a genuine conviction that what I held in mind as the world at
hand was merely a description of the world; a description that had
been pounded into me from the moment I was born."
- Carlos Castaneda. Journey to Ixtlan.
Related Vision Work.
Stanford Hand/Eye
SRI - Hart & Duda.
MIT Guzman, Waltz